MetDecode: methylation-based deconvolution of cell-free DNA for noninvasive multi-cancer typing

Antoine Passemiers et al.

Presenter: Tony Liang


October 31, 2024

Background

Circulating cell-free DNA¹
  • Circulating cell-free DNA (cfDNA) consists of DNA fragments released into the bloodstream

  • The fraction of cfDNA released from cancer or tumor cells is called circulating tumor DNA (ctDNA)

  • cfDNA carries genetic and epigenetic changes that can reveal the cells from which it originated

    • This can help identify different types of cancer

Detecting the origin of cfDNA

Current cfDNA screening tests can detect the presence of abnormal signals but cannot determine the tumor’s origin, that is, the cancer type or tissue of origin (TOO)

  • Computational methods use epigenetic markers such as methylation profiles to deduce the origin of cfDNA fragments
    • They “deconvolute” the plasma cfDNA composition
    • Approaches vary: probabilistic models, linear models, matrix factorization, etc.
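
To make the linear-model flavor of deconvolution concrete, here is a minimal sketch that treats one cfDNA sample as a non-negative mixture of reference methylation profiles. It uses plain projected gradient descent for non-negative least squares, which is a generic technique and not MetDecode's own algorithm. The function name `deconvolve_nnls`, the toy atlas values, and the step size are all assumptions made for illustration.

```python
def deconvolve_nnls(r, atlas, steps=5000, lr=0.1):
    """Estimate non-negative cell-type proportions for one sample.

    r     : observed methylation ratios, one per marker (length p)
    atlas : atlas[j][k] = reference methylation ratio of cell type j at marker k
    """
    m, p = len(atlas), len(r)
    a = [1.0 / m] * m  # start from a uniform mixture
    for _ in range(steps):
        # residual of the reconstruction at each marker
        resid = [sum(a[j] * atlas[j][k] for j in range(m)) - r[k] for k in range(p)]
        # gradient of 0.5 * ||resid||^2 with respect to each proportion
        grad = [sum(resid[k] * atlas[j][k] for k in range(p)) for j in range(m)]
        # gradient step, then projection onto the constraint a >= 0
        a = [max(0.0, a[j] - lr * grad[j]) for j in range(m)]
    total = sum(a)
    return [x / total for x in a] if total > 0 else a


# Toy atlas (made-up values): 3 cell types x 3 markers
atlas = [
    [0.9, 0.1, 0.5],
    [0.1, 0.8, 0.5],
    [0.2, 0.2, 0.9],
]
true_props = [0.7, 0.2, 0.1]
# Noise-free simulated sample: a weighted sum of the atlas rows
r = [sum(true_props[j] * atlas[j][k] for j in range(3)) for k in range(3)]
est = deconvolve_nnls(r, atlas)
```

On this noise-free toy input the estimate recovers the mixing proportions; real cfDNA data are noisy, and the atlas may be incomplete, which is the gap the following slides discuss.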

Limitations of existing methods

  • Cannot deconvolute multiple cancer tissues
  • Do not account for contributors that are missing because the reference atlas is incomplete
  • Do not fully deconvolute all cfDNA components; they estimate cell-type proportions only

MetDecode

In some sense, MetDecode “combines” existing methodologies such as non-negative least squares and matrix factorization.

  • The authors also built a new reference atlas of tissue-specific methylation markers for 4 different cancer tissues
    • Breast, ovarian, cervical and colorectal
  • It offers the option to extend the reference atlas with unknown methylation patterns on the fly

The main deconvolution algorithm

\[ f(A) = \sum_{i=1}^{n} \sum_{k=1}^{p} W_{ik} \, \Big| \underbrace{R_{ik}^{\text{(cfdna)}}}_{(1)} - \underbrace{\sum_{j=1}^{m} A_{ij} B_{jk}}_{(2)} \Big| \]

  1. Observed cfDNA methylation ratios \(R^{\text{(cfdna)}}\), for sample \(i\) at marker \(k\)
  2. Reconstructed matrix, which approximates \((1)\) as a mixture over the \(m\) cell types
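
As a worked instance of the objective above, the sketch below evaluates the weighted absolute error for given matrices. It only computes \(f\); it does not perform the optimization. The function name `weighted_l1_loss` and the toy numbers are assumptions, and \(W\) is filled with arbitrary values because the slides do not give the exact weighting formula.

```python
def weighted_l1_loss(R_cfdna, A, B, W):
    """Sum over samples i and markers k of W[i][k] * |R_cfdna[i][k] - (A @ B)[i][k]|."""
    n, p, m = len(R_cfdna), len(R_cfdna[0]), len(B)
    total = 0.0
    for i in range(n):
        for k in range(p):
            # term (2): reconstruction from proportions A and profiles B
            recon = sum(A[i][j] * B[j][k] for j in range(m))
            # weighted absolute deviation from term (1)
            total += W[i][k] * abs(R_cfdna[i][k] - recon)
    return total


# One sample, two markers, two cell types (made-up values)
R_cfdna = [[0.5, 0.3]]
A = [[0.5, 0.5]]
B = [[1.0, 0.0],
     [0.0, 1.0]]
W = [[1.0, 2.0]]  # hypothetical weights; the second marker counts double
loss = weighted_l1_loss(R_cfdna, A, B, W)  # 1.0 * 0.0 + 2.0 * 0.2, about 0.4
```

The absolute value makes the loss less sensitive to outlier markers than a squared error would be, and the per-entry weights let low-confidence entries contribute less.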

Some math behind how MetDecode addresses unknown cell-type contributors

MetDecode accounts for unknown contributors in the cfDNA mixture by adding \(h\) extra rows to \(R^{\text{(atlas)}}\)

\[ R_{hk}^{\text{(atlas)}} = \begin{cases} R_k^{\text{lb}}, & e_k > 0 \\ R_k^{\text{ub}}, & \text{otherwise} \end{cases} \quad \text{where} \quad e_k = \operatorname{median}_i \Big( -R_{ik}^{\text{(cfdna)}} + \sum_{j} \alpha_{ij} R_{jk}^{\text{(atlas)}} \Big) \]
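
The rule above can be sketched in a few lines: for each marker, take the median over samples of the reconstruction minus the observation, and pick the lower bound when the atlas overshoots and the upper bound otherwise. The function name `unknown_row` is hypothetical, and the slides do not define \(R^{\text{lb}}\) and \(R^{\text{ub}}\), so this sketch assumes they are per-marker lower and upper bounds on the methylation ratio, set to 0 and 1 in the toy example.

```python
from statistics import median


def unknown_row(R_cfdna, alpha, R_atlas, R_lb, R_ub):
    """Build one extra atlas row for an unknown contributor.

    R_cfdna : n x p observed cfDNA methylation ratios
    alpha   : n x m current proportion estimates
    R_atlas : m x p reference methylation ratios
    R_lb, R_ub : length-p bounds (assumed meaning: lowest / highest allowed ratio)
    """
    n, p, m = len(R_cfdna), len(R_cfdna[0]), len(R_atlas)
    row = []
    for k in range(p):
        # e_k: median over samples of (reconstruction - observation)
        e_k = median(
            -R_cfdna[i][k] + sum(alpha[i][j] * R_atlas[j][k] for j in range(m))
            for i in range(n)
        )
        # atlas overshoots -> unknown profile starts low; otherwise it starts high
        row.append(R_lb[k] if e_k > 0 else R_ub[k])
    return row


# Toy example (made-up values): one known cell type, three samples, two markers
R_atlas = [[0.8, 0.2]]
alpha = [[1.0], [1.0], [1.0]]
R_cfdna = [[0.6, 0.4], [0.7, 0.5], [0.5, 0.3]]
new_row = unknown_row(R_cfdna, alpha, R_atlas, R_lb=[0.0, 0.0], R_ub=[1.0, 1.0])
# new_row == [0.0, 1.0]
```

In the toy data the atlas overestimates methylation at the first marker and underestimates it at the second, so the unknown contributor is given the opposite extremes to absorb the residual.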

Evaluation metrics

  • Pearson correlation coefficient and mean squared error (MSE) to evaluate MetDecode’s estimates

  • Accuracy to evaluate multiclass cancer TOO prediction, and Cohen’s kappa to correct for chance agreement in the multiclass setting

  • The authors also examined the limit of detection using in silico mixtures of tumor gDNA and healthy cfDNA
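
For reference, the four metrics listed above can be written out directly from their textbook definitions. This is a plain-Python sketch with made-up inputs, not the authors' evaluation code; in practice a library such as SciPy or scikit-learn would normally be used.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mse(x, y):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def accuracy(true, pred):
    """Fraction of exactly matching labels."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def cohen_kappa(true, pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e). Assumes p_e < 1."""
    n = len(true)
    p_o = accuracy(true, pred)
    count_true, count_pred = Counter(true), Counter(pred)
    # chance agreement from the marginal label frequencies
    p_e = sum(count_true[c] * count_pred[c] for c in count_true) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Made-up proportions and labels, for illustration only
true_props = [0.70, 0.20, 0.10]
est_props = [0.65, 0.25, 0.10]
r = pearson(true_props, est_props)
err = mse(true_props, est_props)

true_too = ["breast", "breast", "ovarian", "colorectal"]
pred_too = ["breast", "ovarian", "ovarian", "colorectal"]
acc = accuracy(true_too, pred_too)       # 0.75
kappa = cohen_kappa(true_too, pred_too)  # 7/11, about 0.636
```

Kappa comes out lower than accuracy here because part of the observed agreement would be expected by chance given how often each label occurs.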

Result 1

Coverage-based weighting in MetDecode

Evaluation of the coverage-based weighting used in MetDecode


  • Evaluated over 50 simulation runs, each containing \(5000\) simulated cfDNA samples.

  • The Pearson correlation coefficient was then computed for each deconvolution algorithm

  • Averaged over all runs, MetDecode’s correlation coefficient was significantly higher than that of all other approaches

    • But not when looking at blood cell types only


Results

Comparison of cancer type predictions, where the predicted type is the highest cancer contributor
  • MetDecode with 1 unknown contributor performs best according to Cohen’s kappa

  • All methods perform equally poorly, with accuracy \(< 50\%\), when predicting on all samples

  • Performance is closer across methods when looking only at the \(19\) samples with tumor fraction \(> 3\%\)¹

    • Here MetDecode predicts the correct TOO in \(16/19\) cancer cases (\(84.2\%\) accuracy)
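
The prediction rule described above (take the cancer type with the highest estimated contribution) amounts to an argmax over the cancer entries of the deconvolution output. The sketch below assumes the output is a dictionary of proportions; the function name `predict_too`, the blood cell type names, and the numbers are made up for illustration.

```python
CANCER_TYPES = ["breast", "ovarian", "cervical", "colorectal"]


def predict_too(contributions, cancer_types=CANCER_TYPES):
    """Return the cancer type with the largest estimated contribution."""
    return max(cancer_types, key=lambda c: contributions.get(c, 0.0))


# Hypothetical deconvolution output for one plasma sample
contributions = {
    "neutrophil": 0.60,
    "lymphocyte": 0.30,
    "breast": 0.06,
    "ovarian": 0.03,
    "cervical": 0.005,
    "colorectal": 0.005,
}
predicted = predict_too(contributions)  # "breast"

# The reported accuracy on the high-tumor-fraction subset
subset_accuracy = round(100 * 16 / 19, 1)  # 84.2
```

Because blood cell types dominate plasma cfDNA, the argmax is restricted to the cancer entries; at low tumor fractions those entries are small and noisy, which fits the weaker results on all samples.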

Conclusion

How could one utilize cfDNA?

cfDNA epigenetic signatures can be used to deduce TOO or cancer type

MetDecode is an algorithm that estimates the cell-type contributions and the cancer type in a cfDNA sample

  • It models unknown contributors not present in the reference atlas

  • It accounts for the coverage of each marker region to alleviate potential sources of noise

Limitations and Future Direction

  • Limited number of cfDNA samples for the different cancer types

    • 93 samples in total: 4 cervical, 13 ovarian, and the rest breast and colorectal
  • Another limit

  • Deconvoluting cfDNA and determining the TOO will help oncologists identify the tumor and direct treatment

    • Especially when invasive examinations and radiological investigations are not ideal

Some comments

  • Why does the weighting approach improve deconvolution accuracy only for cancer components and not for blood cell types?

  • Cell-type deconvolution still seems hard (low accuracy when predicting cancer type); what is the next step?

  • As an aside, can you always just combine existing approaches to get a “new” method?

Thanks!

Reference

Adalsteinsson, Viktor A, Gavin Ha, Samuel S Freeman, Atish D Choudhury, Daniel G Stover, Heather A Parsons, Gregory Gydush, et al. 2017. “Scalable Whole-Exome Sequencing of Cell-Free DNA Reveals High Concordance with Metastatic Tumors.” Nature Communications 8 (1): 1324.